refactor(risk): dedup SQL category classifier via unnest arrays by mfbx9da4 · Pull Request #2987 · speakeasy-api/gram

mfbx9da4 · 2026-05-21T19:12:31Z

Why

Follow-up to #2976. The (source, rule_id) -> category mapping lived in four duplicated CASE expressions in server/internal/risk/queries.sql (~40 lines each, ~160 lines total) and in a separate TypeScript copy in the dashboard, even though the Go classifier in internal/risk/categories already owns the canonical mapping. Adding or rebucketing a rule meant editing the Go classifier AND four SQL blocks AND the dashboard's risk-utils.ts in lockstep, with no compile-time check that they agreed.

An earlier iteration of this PR tried a session-scoped TEMP TABLE populated via a pgxpool AfterConnect hook. It worked, but bound a runtime concern (connection bootstrap) to a build concern (sqlc static analysis), and required a sqlc_schema.sql stub plus matching hook in cmd/gram and the test pool. Switched to passing the classifier in directly as query parameters, no schema declarations or connection hooks.

What changed

Server: classifier handed in as five parallel arrays

internal/risk/categories.SQLRows() projects Definitions into five parallel arrays sized 1-to-1 by row:

priority, category, source, ruleID, rulePrefix := categories.SQLRows()

Each caller in internal/risk/impl.go passes these as @cat_priority::int[], @cat_category::text[], etc.

All four affected queries (ListRiskUserCategoryBreakdown, ListRiskRulesByCategory, ListRiskOverviewTimeSeriesFindings, ListRiskResultsByProjectFound) now open with:

WITH risk_category_lookup AS (
  SELECT unnest(@cat_priority::int[])    AS priority,
         unnest(@cat_category::text[])   AS category,
         unnest(@cat_source::text[])     AS source,
         unnest(@cat_rule_id::text[])    AS rule_id,
         unnest(@cat_rule_prefix::text[]) AS rule_prefix
)

and resolve categories per row via:

COALESCE(
  (SELECT rcl.category FROM risk_category_lookup rcl
   WHERE (rcl.source     != '' AND rcl.source     = rr.source)
      OR (rcl.rule_id    != '' AND rcl.rule_id    = rr.rule_id)
      OR (rcl.rule_prefix != '' AND rr.rule_id LIKE rcl.rule_prefix || '%')
   ORDER BY rcl.priority ASC
   LIMIT 1),
  'custom'
)::text

Empty-string sentinels (instead of NULL) because the Go classifier emits one row per definition with the irrelevant fields blanked out; this keeps the WHERE clause uniform without three-valued logic.

Dashboard: classifier reads from /rpc/risk.categories

risk-utils.ts used to maintain its own SOURCE_TO_CATEGORY map plus a DETECTION_RULES-derived rule_id lookup, a parallel copy of the Go classifier. Replaced with useFindingClassifier(), a hook backed by the existing useRiskCategories() React Query binding. CategoryLabel renders nothing during the first fetch and falls back to "custom" for unmapped rules once data is loaded. DETECTION_RULES stays only as the source of per-rule human-readable titles (the API exposes the classification but not titles).

What's gone

internal/risk/categories/bootstrap.go (the AfterConnect hook)
internal/risk/categories/sqlc_schema.sql (the sqlc-only schema stub)
The pgxpool.Config.AfterConnect = categories.BootstrapConnection lines in server/cmd/gram/deps.go and server/internal/testenv/postgresql.go
The Filter struct and FilterFor() helper (leftovers from an even earlier design)
The categories.Filter exhaustruct exclusion
The dashboard's SOURCE_TO_CATEGORY map and RULE_ID_TO_CATEGORY lookup

Net change

queries.sql: still ~140 lines lighter than the duplicated-CASE world; the four 30-50 line CASE blocks are gone
Dashboard: classifier mapping no longer ships as static data; it follows the Go source via the API
No new connection hooks, no schema stubs, no migrations
Adding a new rule or moving one between categories: edit Go Definitions once, restart, dashboard picks it up automatically

Before

CASE
  WHEN rr.source IN ('shadow_mcp', 'destructive_tool', 'cli_destructive', 'prompt_injection') THEN rr.source
  WHEN rr.rule_id LIKE 'secret.%' THEN 'secrets'
  WHEN rr.rule_id IN ('pii.credit_card', 'pii.iban_code', 'pii.us_bank_number', 'pii.crypto') THEN 'financial'
  WHEN rr.rule_id IN ( 'pii.us_ssn', 'pii.us_passport', ..., 'pii.sg_nric_fin' ) THEN 'government_ids'
  ...
  ELSE 'custom'
END AS category   -- × 4 queries

// risk-utils.ts (parallel copy)
const SOURCE_TO_CATEGORY = new Map([["gitleaks", "secrets"], ["presidio", "pii"], ...])
for (const [category, rules] of Object.entries(DETECTION_RULES)) {
  for (const rule of rules) ruleIdToCategory.set(rule.id, category)
}

After

WITH risk_category_lookup AS (SELECT unnest(...) ...)
SELECT ..., COALESCE((SELECT rcl.category FROM risk_category_lookup rcl WHERE ... LIMIT 1), 'custom')::text AS category

// risk-utils.ts
const { data } = useRiskCategories(...)
// classify per row using data.categories

and one Go file with the canonical Definitions slice driving both.

Query performance

EXPLAIN ANALYZE on the local dev DB (575 chats / 38,632 chat_messages / 4,851 findings; busiest project = 2,410 findings):

Query	Before	After	Notes
`ListRiskRulesByCategory` (category='pii', 7d)	3.2 ms	~8.7 ms	Same regression vs the inline-CASE baseline as the temp-table version (subquery scans the ~14-row unnested set per result row). Plan shape identical to the temp-table approach.
Other risk queries	unchanged plan shape	unchanged	The subquery only fires for rows that pass the project + window bitmap.

The lookup is materialized once per query via the CTE; PostgreSQL plans the seq-scan against a 14-row in-memory tuple stream so there is no IO cost. Linear in the result set, not in the lookup-table size.

Test coverage

Existing risk integration tests (results_test.go, resultsbychat_test.go, listgetdelete_test.go, overview_test.go) all pass.
The Go classifier unit test (categories_test.go) is unchanged and still pins the canonical behaviour.
Dashboard pnpm build and pnpm lint pass.

Follow-up to #2976. Four duplicated CASE expressions in server/internal/risk/queries.sql shared a 40-line categorisation block that the Go classifier in internal/risk/categories already owned. Replace them with a single subquery that joins risk_results against a session-scoped TEMP TABLE risk_category_lookup populated from the canonical Go classifier on every pool connection. How it works: - internal/risk/categories.BootstrapConnection runs on AfterConnect for every pgxpool connection (cmd/gram/deps.go and the testenv pool helper for risk_results tests). It creates risk_category_lookup if needed, truncates it, and inserts rows derived from Definitions. - All four risk queries now resolve the category via: (SELECT rcl.category FROM risk_category_lookup rcl WHERE source-match OR rule_id-match OR LIKE prefix ORDER BY priority LIMIT 1) -- with COALESCE to 'custom' - internal/risk/categories/sqlc_schema.sql declares the temp table for sqlc's static analysis (not migrated; runtime owns table creation). Result: one source of truth. Adding or moving a rule means editing Definitions and restarting the server; SQL never has to be touched. Perf: ListRiskRulesByCategory goes from ~3.2ms to ~8.7ms on the local dev DB (2,410 findings) because of the SubPlan that scans the 14-row lookup table per result row. Well within budget; see PR description for the EXPLAIN ANALYZE comparison.

claude

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

_{Tip: disable this comment in your organization's Code Review settings.}

vercel · 2026-05-21T19:12:37Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
gram-docs-redirect	Ready	Preview, Comment	May 22, 2026 7:43am

changeset-bot · 2026-05-21T19:12:37Z

⚠️ No Changeset found

Latest commit: cc4fd53

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

speakeasybot · 2026-05-21T19:17:23Z

🚀 Preview Environment (PR #2987)

Preview URL: https://pr-2987.dev.getgram.ai

Component	Status	Details	Updated (UTC)
✅ Database	Ready	Existing database reused	2026-05-22 10:50:58.
✅ Images	Available	Container images ready	2026-05-22 10:49:46.

_{Gram Preview Bot}

The previous commit introduced a per-connection TEMP TABLE (risk_category_lookup) populated by a pgxpool AfterConnect hook so that the four risk-overview queries could share one category classifier without each carrying its own CASE expression. That worked but bound a runtime concern (connection bootstrap) to a build concern (sqlc static analysis), and it required a sqlc_schema.sql stub plus matching bootstrap code in cmd/gram/deps.go and internal/testenv. Switch to passing the classifier as five parallel arrays (priority, category, source, rule_id, rule_prefix) per query call. Each query builds the same risk_category_lookup CTE inline via parallel unnest() calls. No connection hook, no schema stub, no AfterConnect hook in tests; the Go classifier in internal/risk/categories remains the single source of truth. Drops the unused Filter type and FilterFor() helper that were leftovers from the previous per-query CASE design.

The dashboard's risk-utils held its own (source, rule_id) -> category classifier built from SOURCE_TO_CATEGORY and DETECTION_RULES. That was a parallel copy of the Go classifier in internal/risk/categories, guaranteed to drift whenever a rule was added on one side and not the other. Replace it with useFindingClassifier(), a hook that reads the canonical classifier through the existing useRiskCategories() React Query binding (/rpc/risk.categories, shipped in #2976). CategoryLabel renders nothing during the first fetch and falls back to "custom" for unmapped rules once data is loaded. Per-rule human-readable titles still come from DETECTION_RULES since the API does not expose them.

mfbx9da4 requested a review from a team as a code owner May 21, 2026 19:12

claude Bot reviewed May 21, 2026

View reviewed changes

github-actions Bot added the preview Spawn a preview environment label May 21, 2026

vercel Bot deployed to Preview May 21, 2026 21:47 View deployment

mfbx9da4 changed the title ~~feat(risk): dedup category classifier via per-connection temp table~~ refactor(risk): dedup SQL category classifier via unnest arrays May 21, 2026

mfbx9da4 requested a review from a team as a code owner May 22, 2026 07:43

vercel Bot deployed to Preview May 22, 2026 07:43 View deployment

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(risk): dedup SQL category classifier via unnest arrays#2987

refactor(risk): dedup SQL category classifier via unnest arrays#2987
mfbx9da4 wants to merge 3 commits into
mainfrom
da/risk-category-classifier-sql-dedup

mfbx9da4 commented May 21, 2026 •

edited

Loading

Uh oh!

claude Bot left a comment

Uh oh!

vercel Bot commented May 21, 2026 •

edited

Loading

Uh oh!

changeset-bot Bot commented May 21, 2026 •

edited

Loading

Uh oh!

speakeasybot commented May 21, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mfbx9da4 commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

What changed

Server: classifier handed in as five parallel arrays

Dashboard: classifier reads from /rpc/risk.categories

What's gone

Net change

Before

After

Query performance

Test coverage

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Claude Code Review

Uh oh!

vercel Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

changeset-bot Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ No Changeset found

Uh oh!

speakeasybot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🚀 Preview Environment (PR #2987)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mfbx9da4 commented May 21, 2026 •

edited

Loading

vercel Bot commented May 21, 2026 •

edited

Loading

changeset-bot Bot commented May 21, 2026 •

edited

Loading

speakeasybot commented May 21, 2026 •

edited

Loading